A Novel Parts of Speech (POS) Tagset for morphological, syntactic and lexical annotations of Saraiki language

Abstract

One of the important resources required for various Natural Language Processing (NLP) applications, such as machine translation, information retrieval and text mining, is annotated corpora. The text corpora annotation process requires parts of speech (POS) tags to mark different parts of the text with grammatical annotations in order to identify the linguistic properties of a word, sentence or discourse. The marking of items is based on two main features: 1) category and 2) context (word, sentence or discourse), i.e. the relationship with adjacent and related text. Saraiki, being one of the oldest languages, is still a resource-scarce language in recorded literature as well as in the computational context. According to our study, at present, there is no tagset defined for the language. This work presents the first hierarchical POS (MPOST) tag set, which is designed to be used for morphological, syntactic and lexical ...

Similar resources

Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course “The Study of Islamic Texts Translation”

The aim of the present study is to propose a pattern, based on speech acts and language functions, for developing materials for the course "The Study of Islamic Texts Translation". In the new pattern, in order to develop better and more engaging materials, and unlike the existing books, use was made of Austin's (1962) model of speech act levels, Searle's (1976) classification of speech acts, and Halliday's (1978) language functions. For this purpose, 57 holy verses were selected at random from different parts of the Quran...

Wuthering Heights and the Concept of Morality / A Sociological Study of the Novel

To discuss my point, I have collected quite a number of articles, anthologies, and books about "Wuthering Heights" applying various ideas and theories to this fantastic story. Hence, I have come to believe that Gadamer and Jauss are right when they claim that "the individual human mind is the center and origin of all meaning," that reading literature is a reader-oriented activity, that it ...

A Comparative Pragmatic Analysis of the Speech Act of “Disagreement” across English and Persian

The speech act of disagreement has been one of the speech acts that have received the least attention in the field of pragmatics. This study investigates the ways power relations, social distance, formality of the context, gender, and language proficiency (for EFL learners) influence disagreement and politeness strategies. The participants of the study were 200 male and female native Persian s...

A Common Parts-of-Speech Tagset Framework for Indian Languages

We present a universal Parts-of-Speech (POS) tagset framework covering most of the Indian languages (ILs) following the hierarchical and decomposable tagset schema. In spite of a significant number of speakers, there is no workable POS tagset and tagger for most ILs, which serve as fundamental building blocks for NLP research. Existing IL POS tagsets are often designed for a specific language; th...

Effects of First Language on Second Language Writing: A Preliminary Contrastive Rhetoric Study of Farsi and English

To explore the idea, the proposed investigation aimed at finding whether the performances of the population of Iranian students studying English in an EFL context are consistent in L1 and L2 writing tasks and whether there is cross-linguistic transfer in this respect. In this regard, the subjects were instructed to write four compositions, two in English and two in Farsi, which consisted of an ...

Journal

Journal title: Journal of Applied and Emerging Sciences

Year: 2021

ISSN: 1814-070X, 2415-2633

DOI: https://doi.org/10.36785/jaes.111459